Introduction

Hearthstone is a popular collectible card game published by Blizzard Entertainment in 2014, which is based on the Warcraft series by the same company. The goal of the game is to build a deck of 30 cards and defeat the opponent who also has a deck of 30 cards.

In Hearthstone, cards can be classified according to the following categories:

  • Class: Neutral cards can be used by all nine classes, while Class cards can only be used by the indicated class.
  • Rarity: How often cards can be found when opening card packs.
    • Cards are indicated as Free, Common, Rare, Epic and Legendary, ordered by increasing rarity.
  • Type: Different types of cards have different effects in the game:
    • Minions are played on the game board and can attack Heroes or other minions.
    • Spells are Class abilities that generate a variety of effects on the board.
    • Weapons are items that Heroes can equip to attack Heroes or other minions.
    • Heroes represent the player; the player loses when the Hero’s health reaches 0.
  • Set: Cards are released in designated sets, which are usually based on a certain theme in the Warcraft universe.
    • The core game consists of two sets, Basic and Classic, which are always available in the game.
    • Expansion sets add newer cards to the game, and are released at regular intervals.

Approaching the question

Which are the most popular cards used in Ranked decks?

We focus on the Ranked format, where players decide which cards to include in their decks. Card popularity is therefore represented more accurately, and gameplay is not subject to the additional constraints that other game modes (like Tavern Brawls and Adventures) may impose.

How do we determine popularity?

We measure a card’s popularity as the number of decks that include at least 1 copy of it among the starting 30 cards (copies generated by other effects during a game are not counted).

A deck can include at most 2 copies of any card (1 for Legendary cards), thus a card’s popularity is not heavily influenced by the number of copies players wish to use.
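As a minimal sketch of this definition in base R, assume a deck composition table with one column per card slot. The toy table, its three-slot decks and the card IDs are made up for illustration and are not taken from the dataset.

```r
# Toy deck composition table: one row per deck, one column per card slot.
# The card IDs and the 3-slot decks are hypothetical.
decks_cards <- data.frame(
  deck_id = c(1, 2, 3),
  card_0  = c(101, 101, 102),
  card_1  = c(101, 102, 103),
  card_2  = c(102, 103, 103)
)

card_cols <- grep("^card_", names(decks_cards), value = TRUE)

# For each deck, keep the distinct card IDs so a second copy is not counted twice.
unique_per_deck <- lapply(seq_len(nrow(decks_cards)), function(i) {
  unique(unlist(decks_cards[i, card_cols], use.names = FALSE))
})

# Popularity = number of decks containing at least one copy of the card.
popularity <- sort(table(unlist(unique_per_deck)), decreasing = TRUE)
print(popularity)
```

Deduplicating within each deck before counting is what makes the measure "decks containing the card" rather than "total copies played".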

Possible biases to consider

  • Decks and cards presented in this data may not necessarily be used by players in the game itself. However, since we have no means of tracking the matches actually played, we use the list of decks submitted by players as an approximation of the decks and cards commonly used by the playerbase.

  • Since Neutral cards can be used by every class, they are expected to appear in more decks than Class cards.

  • For the Wild format, cards from the older sets may be more popular simply because they have been in the game longer.

  • For the Standard format, cards from the Basic and Classic sets are likely to be more popular because, unlike expansion cards, they do not rotate out of the format.

Why address such a question?

If a certain card becomes too popular (i.e. the community thinks players must include it in their decks), it reduces the card variety in the metagame and makes gameplay frustrating for other players (amongst other consequences). In the long term, this may lead to player attrition and loss of potential revenue (when players purchase card packs or other cosmetics).

Historically, Blizzard has dealt with problematic cards in one of several ways:

Dataset Overview

We will use three datasets in this analysis:

  • data.csv contains a list of decks submitted by players to HearthPwn from 2013 (pre-launch) to 2017.
  • refs.json contains detailed information about all cards (collectible and non-collectible) up to March 2017.
  • cards_collectible.json contains detailed information about the cards that are collectible in the game (up to August 2018).

Decks data

The first few rows and columns of the raw data decks_raw are shown below:

craft_cost  date        deck_archetype  deck_class  deck_format  deck_id  deck_set   deck_type     rating  title
9740        2016-02-19  Unknown         Priest      W            433004   Explorers  Tavern Brawl  1       Reno Priest
9840        2016-02-19  Unknown         Warrior     W            433003   Explorers  Ranked Deck   1       RoosterWarrior
2600        2016-02-19  Unknown         Mage        W            433002   Explorers  Theorycraft   1       Annoying
15600       2016-02-19  Unknown         Warrior     W            433001   Explorers  None          0       Standart pay to win warrior
7700        2016-02-19  Unknown         Paladin     W            432997   Explorers  Ranked Deck   1       Palamix
5740        2016-02-19  Unknown         Warrior     W            432995   Explorers  Ranked Deck   2       Kolento’s Elise Control Warrior

The decks_raw data has 346232 rows and 41 columns. The columns craft_cost to user describe each deck’s attributes (such as date submitted, class and deck format), while the columns card_0 to card_29 list the deck’s cards by their card IDs. Detailed information on the variables can be found on the Kaggle dataset page, History of Hearthstone.

##  [1] "craft_cost"     "date"           "deck_archetype" "deck_class"    
##  [5] "deck_format"    "deck_id"        "deck_set"       "deck_type"     
##  [9] "rating"         "title"          "user"           "card_0"        
## [13] "card_1"         "card_2"         "card_3"         "card_4"        
## [17] "card_5"         "card_6"         "card_7"         "card_8"        
## [21] "card_9"         "card_10"        "card_11"        "card_12"       
## [25] "card_13"        "card_14"        "card_15"        "card_16"       
## [29] "card_17"        "card_18"        "card_19"        "card_20"       
## [33] "card_21"        "card_22"        "card_23"        "card_24"       
## [37] "card_25"        "card_26"        "card_27"        "card_28"       
## [41] "card_29"

There are 8 rows that contain missing data. All the missing values are in the title column, so they can be safely ignored.

date        deck_archetype  deck_class  deck_format  deck_id  deck_set         deck_type    rating  title
2016-06-22  Unknown         Hunter      S            576543   Old Gods         Theorycraft  1       NA
2014-07-20  Unknown         Rogue       S            74841    Live Patch 5506  Ranked Deck  0       NA
2013-11-19  Unknown         Priest      S            17994    Beta Patch 3937  Arena        1       NA
2013-11-03  Unknown         Hunter      S            15525    Beta Patch 3937  None         1       NA
2015-08-30  Unknown         Paladin     W            318748   TGT Launch       None         1       NA
2015-12-23  Unknown         Shaman      W            400510   Explorers        None         1       NA
2015-12-20  Unknown         Shaman      W            399274   Explorers        None         1       NA
2015-12-20  Unknown         Shaman      W            399273   Explorers        None         1       NA
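A check of this kind can be sketched in base R as follows. The small data frame is a made-up stand-in for decks_raw, and the variable names na_per_column and incomplete_rows are illustrative rather than the author's.

```r
# Toy stand-in for decks_raw; the values are illustrative only.
decks_raw <- data.frame(
  deck_id    = c(1, 2, 3),
  deck_class = c("Mage", "Priest", "Rogue"),
  title      = c("Tempo Mage", NA, "Miracle"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(decks_raw))

# Rows that contain at least one missing value.
incomplete_rows <- decks_raw[!complete.cases(decks_raw), ]

print(na_per_column)
print(incomplete_rows)
```

If the only non-zero entry of na_per_column belongs to a column that the analysis never uses, such as title, the affected rows can be kept as they are.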

Collectible Cards data

The first few rows and columns of the cards_raw data are shown below (some columns are shown truncated):

artist     cardClass  collectible  cost  dbfId  flavor    id      name      rarity  set  text      type
Nutthap…   MAGE       TRUE         5     2539   It’s on…  AT_001  Flame L…  COMMON  TGT  Deal $8…  SPELL
Tooth      MAGE       TRUE         3     2541   Burning…  AT_002  Effigy    RARE    TGT  Secr…     SPELL
Arthur …   MAGE       TRUE         2     2545   And he …  AT_003  Fallen …  RARE    TGT  Your He…  MINION
Gabor S…   MAGE       TRUE         1     2572   Now wit…  AT_004  Arcane …  EPIC    TGT  Deal $2…  SPELL
Mike Sass  MAGE       TRUE         3     2542   It’s al…  AT_005  Polymor…  RARE    TGT  Transfo…  SPELL
Dan Scott  MAGE       TRUE         4     2549   Is he a…  AT_006  Dalaran…  COMMON  TGT  Insp…     MINION

This dataset has 1751 rows and 65 columns. The first 32 columns artist to questReward describe the characteristics of each card.

##  [1] "artist"             "cardClass"          "collectible"       
##  [4] "cost"               "dbfId"              "flavor"            
##  [7] "id"                 "name"               "rarity"            
## [10] "set"                "text"               "type"              
## [13] "mechanics"          "attack"             "health"            
## [16] "referencedTags"     "race"               "elite"             
## [19] "targetingArrowText" "durability"         "overload"          
## [22] "spellDamage"        "armor"              "faction"           
## [25] "howToEarn"          "howToEarnGolden"    "collectionText"    
## [28] "classes"            "multiClassGroup"    "entourage"         
## [31] "hideStats"          "questReward"

Some of the characteristics, including name, Mana cost, race, health and text, can also be seen on the card itself. Some of the mechanics are also highlighted in bold:

The remaining columns are nested under the playRequirements field, which specifies the conditions under which certain cards can be played. These requirements are also stated, implicitly or explicitly, in the card text. (Column names are shown truncated.)

##  [1] "...ents.REQ_MINION_TARGET" "...nts.REQ_TARGET_TO_PLAY"
##  [3] "...ments.REQ_ENEMY_TARGET" "...s.REQ_TARGET_WITH_RACE"
##  [5] "..._MINIMUM_ENEMY_MINIONS" "...s.REQ_TARGET_FOR_COMBO"
##  [7] "...ts.REQ_FRIENDLY_TARGET" "...EQ_TARGET_IF_AVAILABLE"
##  [9] "...s.REQ_NUM_MINION_SLOTS" "...NIMUM_FRIENDLY_MINIONS"
## [11] "...ARGET_WITH_DEATHRATTLE" "...OF_RACE_DIED_THIS_GAME"
## [13] "..._MINION_DIED_THIS_GAME" "...s.REQ_LEGENDARY_TARGET"
## [15] "...BLE_AND_DRAGON_IN_HAND" "....REQ_TARGET_MAX_ATTACK"
## [17] "...nts.REQ_NONSELF_TARGET" "...s.REQ_STEALTHED_TARGET"
## [19] "...ements.REQ_HERO_TARGET" "...T_OR_MANA_CRYSTAL_SLOT"
## [21] "...s.REQ_UNDAMAGED_TARGET" "...ts.REQ_WEAPON_EQUIPPED"
## [23] "...nts.REQ_DAMAGED_TARGET" "...EQ_MUST_TARGET_TAUNTER"
## [25] "....REQ_TARGET_MIN_ATTACK" "..._MINIMUM_TOTAL_MINIONS"
## [27] "...ments.REQ_DRAG_TO_PLAY" "...ONE_CAP_FOR_NON_SECRET"
## [29] "...ents.REQ_FROZEN_TARGET" "...NO_3_COST_CARD_IN_DECK"
## [31] "...NIMUM_FRIENDLY_SECRETS" "...s.REQ_CANNOT_PLAY_THIS"
## [33] "...ENTAL_PLAYED_LAST_TURN"

The following card, based on its text, would require certain conditions to be played (a minion on the board, that has not taken any damage):

Additional information on the variables can be found on HearthstoneJSON.

Each card has two unique identifiers: a character/string id and an integer dbfId. The integer dbfId appears in both the deck and card datasets, so the two can later be joined on it.
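A join of this kind might look like the base R sketch below. It assumes the deck composition has been reshaped to one row per deck and card; the names deck_cards_long, cards and deck_cards_named are hypothetical, and the two card rows reuse values from the preview above.

```r
# Long-format deck composition: one row per (deck, card) pair. Toy values.
deck_cards_long <- data.frame(
  deck_id = c(1, 1, 2),
  dbfId   = c(2539, 2541, 2539)
)

# Card attributes keyed by the integer dbfId.
cards <- data.frame(
  dbfId  = c(2539, 2541),
  id     = c("AT_001", "AT_002"),
  rarity = c("COMMON", "RARE"),
  stringsAsFactors = FALSE
)

# Left join: keep every deck entry even if its card is missing from `cards`.
deck_cards_named <- merge(deck_cards_long, cards, by = "dbfId", all.x = TRUE)
print(deck_cards_named)
```

Using all.x = TRUE keeps unmatched deck entries with NA attributes, which makes mislabelled IDs easy to spot afterwards.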

Pre-processing

The following steps were involved in pre-processing the raw data:

Decks dataset

  • Split the raw data into a deck attribute table containing the decks’ properties, and a deck composition table containing the decks’ card IDs, which are joined by deck_id.
  • Exclude decks created before the game launch date (2014-03-11), due to the game undergoing significant changes in the testing phases.
  • Convert categorical variables from character to factor type.
  • Add a hsyear variable that categorizes deck submissions into year-long periods that determine which Hearthstone card sets are eligible for Standard format play.
    • These periods are not based on the calendar year; each hsyear begins with the release of the first expansion card set in that calendar year.
  • Add a hsmonth variable, based on calendar months, categorizing deck submissions into month-long Ranked Seasons. Rankings are reset at the start of each season, giving players opportunities to try out new decks and improve on their previous ranking.
  • For the deck_format column, relabel all decks submitted before 2016-04-26 as “Standard”.
    • Prior to this date, all collectible cards were playable in Ranked games. From game year 2016 onwards, Ranked play is split into the Standard format (restricted to cards from the Basic and Classic sets, plus sets released in the last 2 calendar years) and the Wild format (all cards).
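The steps above can be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the author's code: the toy data frame has two card slots instead of 30, the hsyear boundaries other than 2016-04-26 are assumed placeholders, and the "YYYY-MM" encoding of hsmonth is a guess at a convenient representation.

```r
# Toy stand-in for decks_raw with two card slots instead of 30.
decks_raw <- data.frame(
  deck_id     = c(1, 2, 3, 4),
  date        = as.Date(c("2013-11-19", "2014-07-20", "2016-02-19", "2016-06-22")),
  deck_class  = c("Priest", "Rogue", "Warrior", "Hunter"),
  deck_format = c("S", "S", "W", "W"),
  card_0      = c(11, 12, 13, 14),
  card_1      = c(21, 22, 23, 24),
  stringsAsFactors = FALSE
)

launch_date   <- as.Date("2014-03-11")  # game launch, from the text
standard_date <- as.Date("2016-04-26")  # Standard/Wild split, from the text
# ASSUMED start dates of each Hearthstone year; only 2016-04-26 is from the text.
hsyear_starts <- as.Date(c("2014-03-11", "2015-04-02", "2016-04-26"))
hsyear_labels <- c("2014", "2015", "2016")

# Drop decks created before the game launched.
decks <- decks_raw[decks_raw$date >= launch_date, ]

# Relabel every deck submitted before the format split as Standard ("S").
decks$deck_format[decks$date < standard_date] <- "S"

# Hearthstone year: index of the latest start date not after the deck's date.
year_idx <- findInterval(as.numeric(decks$date), as.numeric(hsyear_starts))
decks$hsyear <- factor(hsyear_labels[year_idx], levels = hsyear_labels)

# Ranked season, encoded here as the calendar month (assumed representation).
decks$hsmonth <- format(decks$date, "%Y-%m")

# Convert categorical columns from character to factor.
decks$deck_class  <- factor(decks$deck_class)
decks$deck_format <- factor(decks$deck_format)

# Split into attribute and composition tables, both keyed by deck_id.
card_cols   <- grep("^card_", names(decks), value = TRUE)
decks_attr  <- decks[, setdiff(names(decks), card_cols)]
decks_cards <- decks[, c("deck_id", card_cols)]
```

Filtering and relabelling before the split means both resulting tables contain exactly the same set of deck_id values.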

A summary of the processed deck attribute data decks_attr is shown below:

##     deck_id         craft_cost         date           
##  Min.   : 36923   Min.   :    0   Min.   :2014-03-11  
##  1st Qu.:253573   1st Qu.: 2840   1st Qu.:2015-05-26  
##  Median :428597   Median : 5120   Median :2016-02-09  
##  Mean   :419989   Mean   : 5745   Mean   :2015-12-21  
##  3rd Qu.:603508   3rd Qu.: 7840   3rd Qu.:2016-08-09  
##  Max.   :749548   Max.   :48000   Max.   :2017-03-19  
##                                                       
##          deck_archetype     deck_class    deck_format
##  Unknown        :220501   Mage   :42230   S:307743   
##  Midrange Shaman:  5472   Priest :41756   W: 16361   
##  Control Priest :  5135   Paladin:39368              
##  Control Warrior:  4939   Warlock:35598              
##  Tempo Mage     :  4545   Druid  :35488              
##  Midrange Hunter:  4371   Shaman :33969              
##  (Other)        : 79141   (Other):95695              
##              deck_set              deck_type          rating        
##  Explorers       : 57307   Arena        :  8178   Min.   :   0.000  
##  Old Gods        : 49895   None         : 75120   1st Qu.:   1.000  
##  Blackrock Launch: 38900   PvE Adventure:  9059   Median :   1.000  
##  Gadgetzan       : 31329   Ranked Deck  :202104   Mean   :   2.777  
##  Naxx Launch     : 22283   Tavern Brawl :  6360   3rd Qu.:   1.000  
##  Yogg Nerf       : 22175   Theorycraft  : 19686   Max.   :4016.000  
##  (Other)         :102215   Tournament   :  3597                     
##     title               user              hsmonth      hsyear      
##  Length:324104      Length:324104      Min.   :2014   2014: 65119  
##  Class :character   Class :character   1st Qu.:2015   2015:128062  
##  Mode  :character   Mode  :character   Median :2016   2016:130923  
##                                        Mean   :2016                
##                                        3rd Qu.:2017                
##                                        Max.   :2017                
## 

Cards dataset

  • Select columns containing attributes that are important for the analysis: this includes the name, ID(s), class, rarity, type and card set.
  • Relabel the values in the set column with the actual names of the card sets.
  • Convert columns that are entirely in UPPERCASE to Title Case for readability.
  • Convert categorical variables from character to factor type.

The factor/enumerated columns are then identified and recast accordingly.
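A rough base R sketch of these steps is given below. The three rows reuse values from the cards_raw preview; the helper to_title() and the lookup vector set_names are hypothetical names introduced for illustration, and only one set code is mapped.

```r
# Toy stand-in for the raw collectible card data.
cards_raw <- data.frame(
  dbfId     = c(2539, 2541, 2545),
  id        = c("AT_001", "AT_002", "AT_003"),
  cardClass = c("MAGE", "MAGE", "MAGE"),
  rarity    = c("COMMON", "RARE", "RARE"),
  set       = c("TGT", "TGT", "TGT"),
  type      = c("SPELL", "SPELL", "MINION"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from set codes to full set names (one entry shown).
set_names <- c(TGT = "The Grand Tournament")

# Title-case helper for single words: "MAGE" becomes "Mage".
to_title <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

# Relabel the set codes and store them under the name card_set.
cards_simple <- cards_raw
cards_simple$card_set <- unname(set_names[cards_simple$set])
cards_simple$set <- NULL

# Recast the upper-case categorical columns as title-cased factors.
for (col in c("cardClass", "rarity", "type")) {
  cards_simple[[col]] <- factor(to_title(cards_simple[[col]]))
}
cards_simple$card_set <- factor(cards_simple$card_set)

print(summary(cards_simple))
```

The id column is also upper-case but is an identifier rather than a category, so this sketch leaves it untouched.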

A summary of the processed collectible card data cards_simple is shown below:

##      dbfId           name                cost          cardClass  
##  Min.   :    7   Length:1751        Min.   : 0.000   Neutral:657  
##  1st Qu.: 1987   Class :character   1st Qu.: 2.000   Paladin:123  
##  Median :38957   Mode  :character   Median : 4.000   Hunter :122  
##  Mean   :25375                      Mean   : 3.856   Mage   :122  
##  3rd Qu.:43163                      3rd Qu.: 5.000   Warlock:122  
##  Max.   :53187                      Max.   :20.000   Druid  :121  
##                                     NA's   :22       (Other):484  
##        rarity        type      collectible         id           
##  Common   :612   Hero  :  33   Mode:logical   Length:1751       
##  Epic     :298   Minion:1192   TRUE:1751      Class :character  
##  Free     :142   Spell : 471                  Mode  :character  
##  Legendary:253   Weapon:  55                                    
##  Rare     :446                                                  
##                                                                 
##                                                                 
##                          card_set  
##  Classic                     :236  
##  Basic                       :142  
##  Journey to Un'Goro          :135  
##  Knights of the Frozen Throne:135  
##  Kobolds & Catacombs         :135  
##  The Boomsday Project        :135  
##  (Other)                     :833

Mislabelled cards

Card IDs in the processed decks data may be incorrect, for the following reasons:

  • Deck submissions are generated by user input.
  • Users choose cards based on card names, not card IDs.
  • However, there are cards that share the same name, but have different IDs.
    • Amongst each set of replicates, only one is collectible in-game.

Therefore, we would like to relabel any card IDs that point to non-collectible cards with the respective IDs that point to the collectible version with the same name.

The following steps were taken:

  • From the table containing deck compositions (cards), compile a list of the unique IDs that appear in any deck in our dataset.
  • Keep only the IDs that are not found in our collectible cards data cards_simple.
  • From the raw data for all cards cards_all_raw, select the cards matching these IDs.
  • Join this table to the cards_simple data by card name.
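These four steps could be sketched in base R as shown here. The data frames are cut-down stand-ins that reuse two ID pairs from this section's example table (Cleave and Dark Wispers); the intermediate names deck_ids, unmatched_ids, unmatched_cards and id_fixes are illustrative.

```r
# IDs that appear in submitted decks (toy subset).
deck_ids <- c(940, 40341, 2177, 2009)

# Collectible cards only.
cards_simple <- data.frame(
  dbfId = c(940, 2009),
  name  = c("Cleave", "Dark Wispers"),
  stringsAsFactors = FALSE
)

# All cards, collectible or not; non-collectible rows share the same names.
cards_all_raw <- data.frame(
  dbfId = c(940, 40341, 2009, 2177),
  name  = c("Cleave", "Cleave", "Dark Wispers", "Dark Wispers"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: unique deck IDs that are missing from the collectible table.
unmatched_ids <- setdiff(unique(deck_ids), cards_simple$dbfId)

# Step 3: look those IDs up in the full card list to recover their names.
unmatched_cards <- cards_all_raw[cards_all_raw$dbfId %in% unmatched_ids,
                                 c("dbfId", "name")]

# Step 4: join on name; merge() suffixes the clashing dbfId columns
# with .x (non-collectible ID) and .y (collectible ID).
id_fixes <- merge(unmatched_cards, cards_simple, by = "name")
print(id_fixes)
```

Joining by name only works because, as noted above, exactly one card in each group of same-named cards is collectible.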

Each dbfId.x on the left is to be replaced by the dbfId.y on the right:

dbfId.x  name                dbfId.y  cost  cardClass  rarity  type    collectible  id       card_set
40341    Cleave              940      2     Warrior    Free    Spell   TRUE         CS2_114  Basic
2177     Dark Wispers        2009     6     Druid      Epic    Spell   TRUE         GVG_041  Goblins vs Gnomes
42146    Doppelgangster      40953    5     Neutral    Rare    Minion  TRUE         CFM_668  Mean Streets of Gadgetzan
38319    Druid of the Claw   692      5     Druid      Common  Minion  TRUE         EX1_165  Classic
2230     Druid of the Fang   2048     5     Druid      Common  Minion  TRUE         GVG_080  Goblins vs Gnomes
2310     Druid of the Flame  2292     3     Druid      Common  Minion  TRUE         BRM_010  Blackrock Mountain

To facilitate relabeling, we create a list of key-value pairs that associates each dbfId.x with the corresponding dbfId.y. This list can be used with the function recode(), which replaces character values by name.
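The idea can be illustrated with a named vector in base R. This is a stand-in for recode(), not the author's call: the helper relabel_ids() is hypothetical, and the three key-value pairs are taken from the table above.

```r
# Keys are the non-collectible IDs (dbfId.x), values the collectible
# IDs (dbfId.y), copied from the example table.
id_map <- c("40341" = 940, "2177" = 2009, "42146" = 40953)

# Hypothetical helper: IDs found in the map are replaced, all others are kept.
relabel_ids <- function(ids, map) {
  # Going through as.integer() avoids keys such as "1e+05" for large IDs.
  key <- as.character(as.integer(ids))
  hit <- key %in% names(map)
  ids[hit] <- unname(map[key[hit]])
  ids
}

card_ids <- c(40341, 940, 2177, 1234)  # 1234 is a made-up ID with no mapping
relabeled <- relabel_ids(card_ids, id_map)
print(relabeled)
```

Leaving unmapped IDs unchanged matters here, because most card IDs in the decks data already point to the collectible version.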

Summary and Reflection

So far, we have looked at the cards that users tend to include in Standard format decks for Ranked play. The same format is used for official Hearthstone tournaments, which makes these popular cards highly visible to a wide audience. We have also looked at card popularity when broken down by various categories, such as class, time period and card set.

Are there any limitations to the data that may have affected our analysis?

The major limitation of this data is that it only looks at decks submitted to a third-party website, which brings up the following issues:

  • Decks submitted may not necessarily be played in the game itself, either because other players think it is too weak, or because the user submits a joke deck that contains absurd combinations of cards and is not meant to be taken seriously.
  • There is no data on how often decks and cards are actually played in the Ranked format.
  • Likewise, there is no information on how effective the decks are at winning games. While the rating attribute may reflect how strong other players consider a deck, it is also biased towards the popularity of the user as well as the date of submission:
    • A deck may initially be strong and highly rated. As new cards are introduced and old cards rotate out of the Standard format, it may wane in strength, yet users are unlikely to retract their votes by then.

How can we expand on this analysis?

  • Examine popular combinations of cards that complement each other well.
  • Examine whether the crafting cost (in dust) of decks has any relation to their popularity (rating).